{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "RDataScience.ipynb",
      "version": "0.3.2",
      "provenance": [],
      "collapsed_sections": []
    },
    "language_info": {
      "codemirror_mode": "r",
      "file_extension": ".r",
      "mimetype": "text/x-r-source",
      "name": "R",
      "pygments_lexer": "r",
      "version": "3.4.4"
    },
    "kernelspec": {
      "display_name": "R",
      "language": "R",
      "name": "ir"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SI8GJ8AIlYOk",
        "colab_type": "text"
      },
      "source": [
        "# R practical for Data Science\n",
        "\n",
        "\n",
        "---\n",
        "\n",
        "## Introduction:\n",
        "\n",
        "This guide examines in more detail the good practices and basics that will allow you to analyze data using R. In this guide, you will learn how to use Jupyter notebooks and libraries to explore and analyze your data in a straightforward, clear and transparent manner.\n",
        "\n",
        "## R Language:\n",
        "\n",
        "R is a programming language for analyzing and displaying statistical and graphical data. The central part of R is an interpretive computational language that allows generation and looping as well as modular programming using functions. R can be combined with procedures written in C, C++, .Net, Python or FORTRAN for more efficiency.\n",
        "\n",
        "## Features of R \n",
        "\n",
        "The most important features of R are:\n",
        "\n",
        "* R is a very advanced, simple and powerful programming language encompassing conditions, loops, recursive functions and customizable input/output functions. \n",
        "* R has efficient data processing and storage capacity. \n",
        "* R offers a suite of operators to perform calculations on tables, lists, vectors and matrices. \n",
        "* R provides a comprehensive and coherent set of tools for data analysis. \n",
        "* R provides graphical tools for data analysis and display directly on the computer.\n",
        "\n",
        "## Data Types\n",
        "\n",
        "The most frequently used structures are: \n",
        "\n",
        "* Scalars \n",
        "* Vectors \n",
        "* Lists\n",
        "* Matrices \n",
        "* Tables\n",
        "* Factors\n",
        "\n",
        "\n",
        "### 1. Scalars :\n",
        "\n",
        "Scalars can be an integer, real, logical or string type. The objects are assigned to values via the operators` < - or =.`\n",
        "`Is() function` is used to list the variables of the workspace, and `rm() function` allows to delete one or more variables (objects).\n",
        "\n",
        "**As examples:**\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "a_aEPkBEqk49",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "2+2 #is the sum of two integers\n",
        "exp(10) #gives the exponential of 10\n",
        "a = log(2) #assign log(2) to object a\n",
        "b <- cos(10) #assign cos(10) to object b\n",
        "a+b # give the sum of two objects a and b\n",
        "a # display the value of a\n",
        "b = 2 # assign the value \"2\" to object b\n",
        "ls() # list the objects already created\n",
        "rm(a) # delete the object already created \"a"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8njhfhnxq4ki",
        "colab_type": "text"
      },
      "source": [
        "###2. Vectors : \n",
        "\n",
        "To create a vector with more than one item, you must use  `c () function` which consists in combining items in a vector. \n",
        "\n",
        "**As examples:**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ciTS9TnMraF6",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Create a vector.\n",
        "Color <- c('red','green','blue')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "tjVTKPCZ1Rq-",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "weight = c(13.5, 20.7, 30.5, 38.1, 41.5) #the weight vector\n",
        "weight # displays the entered values\n",
        "length(weight) # counts the number of measurements\n",
        "order(weight) # orders the values entered\n",
        "sum(weight) # returns the sum of the elements\n",
        "min(weight) # Min of weight\n",
        "max(weight) # Max of weight\n",
        "prod(weight) # product of weight\n",
        "weightskg = weight * 0.001; weightskg # converts the measurements into kg\n",
        "weight[3] # value at position 3\n",
        "weight[c(1,3,5)] # the values in first, third and fifth position.\n",
        "average = sum(weight)/5 # the average\n",
        "median = median(weight) #the median\n",
        "variance = var(weight) #the variance\n",
        "seq1 = 1:12 #generates a series of 1 to 12\n",
        "seq2 = seq(-5,3,by=1); # display -5 -4 -3 -2 -1 0 1 2 3\n",
        "seq3= seq(1,2,by=0.1) # displays 1.0 1.1 1.1 1.2 1.3 1.3 1.4 1.5 1.6 1.7 1.8 1.8 1.9 2.0\n",
        "seq4= seq(10.00000,50.00000,by=4.4444444)\n",
        "fix(weight) # opens a window or corrects the value\n",
        "weight # to check the correction\n",
        "# the weeks corresponding to the weights:\n",
        "time = c(1,2,2,3,3,4,5) #creates the time vector\n",
        "time # displays its content\n",
        "time = seq(1,5,by=1); time # creates the time vector but with another way using\n",
        "\"seq\"\n",
        "rev(time) #reverses the ordr of the sequence\n",
        "rep(time,3) # repeats the sequence 3 times\n",
        "sum(time) #Sum of the sequence\n",
        "length(time) #sequence length\n",
        "name = names(time)[1:5] <- c(\"week1\", \"week2\", \"week3\", \"week4\", \"week5\"); name: #renames the elements of the time vector"
      ],
      "execution_count": 0,
      "outputs": []
    },
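    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "The difference between `order()` and `sort()` is easy to miss, so here is a minimal sketch. The vector `shuffled` is a hypothetical, unsorted copy of the weights used above, introduced only for illustration.\n",
        "\n",
        "```r\n",
        "shuffled <- c(30.5, 13.5, 41.5, 20.7, 38.1)  # hypothetical unsorted weights\n",
        "idx <- order(shuffled)      # positions that would sort the vector: 2 4 1 5 3\n",
        "sorted <- shuffled[idx]     # 13.5 20.7 30.5 38.1 41.5\n",
        "identical(sorted, sort(shuffled))   # TRUE: sort() returns the sorted values directly\n",
        "mean(shuffled)              # same as sum(shuffled) / length(shuffled)\n",
        "```\n",
        "\n",
        "`order()` is mostly useful when the ordering of one vector has to be applied to another object, for example to sort the rows of a table by one column."
      ]
    },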
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SeYzxWHArdGD",
        "colab_type": "text"
      },
      "source": [
        "### 3. Lists : \n",
        "\n",
        "A list is an R object that contains many different elements types including vectors, functions and even another list.\n",
        "\n",
        "**As example:**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "N61mue05r7es",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Create a list of items.\n",
        "list1 <- list(c(2,5,3),21.3)"
      ],
      "execution_count": 0,
      "outputs": []
    },
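    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "Once a list exists, its elements can be reached by position or by name. The sketch below assumes a hypothetical named list `list1` with made-up element names (`values`, `number`, `label`); it is an illustration, not part of the original example.\n",
        "\n",
        "```r\n",
        "# A hypothetical list mixing a numeric vector, a number and a string\n",
        "list1 <- list(values = c(2, 5, 3), number = 21.3, label = 'demo')\n",
        "list1[[1]]         # first element by position: the vector 2 5 3\n",
        "list1$number       # element by name: 21.3\n",
        "list1[['label']]   # same as list1$label\n",
        "length(list1)      # number of elements in the list: 3\n",
        "str(list1)         # compact overview of the structure\n",
        "```\n",
        "\n",
        "Double brackets `[[ ]]` return the element itself, while single brackets `[ ]` return a shorter list."
      ]
    },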
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "I_4x9Nhyr81s",
        "colab_type": "text"
      },
      "source": [
        "### 4. Matrices : \n",
        "\n",
        "A matrix is a 2-dimensional dataset. It can be created using a vector input to the matrix function.\n",
        "\n",
        "**As examples:**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "-I-lsJGHsVif",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Create a matrix with 3 columns and 2 rows.\n",
        "M = matrix('c('a','a','b','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "8IBh8sVp16nX",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "x<-seq(1:5) #creates a sequence named x\n",
        "y<-x*2 #creates an object y\n",
        "cbind(x,y) #manipulates the vectors to form a matrix\n",
        "xy<-rbind(x,y) #manipulates vectors to form a matrix\n",
        "xy\n",
        "matrix(1:20, nrow=5, byrow=T) #creates a matrix of 20 elements with a number of lines of 5"
      ],
      "execution_count": 0,
      "outputs": []
    },
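    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "Matrix elements are selected with `[row, column]`. The following is a small sketch on a hypothetical 2 x 3 matrix `m`; leaving one index empty selects a whole row or column.\n",
        "\n",
        "```r\n",
        "m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # rows: 1 2 3 and 4 5 6\n",
        "dim(m)     # dimensions: 2 3\n",
        "m[1, 2]    # row 1, column 2: 2\n",
        "m[2, ]     # the whole second row: 4 5 6\n",
        "m[, 3]     # the whole third column: 3 6\n",
        "t(m)       # transpose: a 3 x 2 matrix\n",
        "```"
      ]
    },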
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "G1dDF0Qb2F5p",
        "colab_type": "text"
      },
      "source": [
        "Now, assuming that the text file \"student.txt\" located in C:/ contains these students :\n",
        "\n",
        "| sex | weight   | size |  \n",
        "|------|------|------|------|\n",
        "|   f  |55 | 166\n",
        "|   f  |53 | 135\n",
        "|   m  |56 | 169\n",
        "|   f  |55 | 161\n",
        "|   m |56 | 187\n",
        "|   f  |67 | 166\n",
        "|   f  |67 | 169\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "nHmjotiZ32YN",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "study <- read.table(\"C:/student.txt\", header = TRUE) \n",
        "# Read the student file.txt \n",
        "study[1:5, ] \n",
        "# Display the table \n",
        "size <- study[, \"taile\"] # displays the size of students \n",
        "sex <- study[,\"sex\"] \n",
        "# Display the sex of students\n",
        "tf <- etud[etud$sexe===\"f\",\"size\"] \n",
        "# display the size of all \"f\" \n",
        "tf"
      ],
      "execution_count": 0,
      "outputs": []
    },
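    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "The cell above only runs if `C:/student.txt` exists. As a self-contained sketch, the same table can be passed to `read.table()` through its `text` argument; the variable name `student_text` is an assumption made for this illustration.\n",
        "\n",
        "```r\n",
        "# Inline copy of the table above, so no file on disk is needed\n",
        "student_text <- 'sex weight size\n",
        "f 55 166\n",
        "f 53 135\n",
        "m 56 169\n",
        "f 55 161\n",
        "m 56 187\n",
        "f 67 166\n",
        "f 67 169'\n",
        "study <- read.table(text = student_text, header = TRUE)\n",
        "str(study)                        # 7 observations of 3 variables\n",
        "study[study$sex == 'f', 'size']   # sizes of the female students: 166 135 161 166 169\n",
        "mean(study$weight)                # average weight of all students\n",
        "```\n",
        "\n",
        "The result of `read.table()` is a data frame: each column can be reached with `$` or by name inside `[ , ]`."
      ]
    },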
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Gi_2fb37sXiC",
        "colab_type": "text"
      },
      "source": [
        "### 5. Tables :\n",
        "\n",
        "Matrices are limited to 2D, while those can be of any number of dimensions. The `array () function` takes a `dim` parameter that creates the required number of dimensions.\n",
        "\n",
        "**As example:**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Ioj99Nkws-uU",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Create an array that contains 2 elements and has a size of 3*3.\n",
        "a <- array(c('red','blue),dim = c(3,3,2))"
      ],
      "execution_count": 0,
      "outputs": []
    },
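    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "Array elements are addressed with one index per dimension. A minimal sketch, assuming a hypothetical numeric array `arr` so that the positions are easy to follow:\n",
        "\n",
        "```r\n",
        "arr <- array(1:18, dim = c(3, 3, 2))  # two 3 x 3 layers, filled column by column\n",
        "dim(arr)       # 3 3 2\n",
        "arr[2, 3, 1]   # row 2, column 3, layer 1: 8\n",
        "arr[, , 2]     # the whole second layer: a 3 x 3 matrix holding 10 to 18\n",
        "```"
      ]
    },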
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6hq7urBWtEJ1",
        "colab_type": "text"
      },
      "source": [
        "### 6. Factors :\n",
        "\n",
        "Factors are R objects that are created using a vector. It stores this vector with different values of items as labels. Labels are usually characters, whether numeric, character or Boolean. It is useful in statistical analysis.\n",
        "\n",
        "**As example: **"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "VEgexjWZuAdL",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Create a vector.\n",
        "color <- c('red','green','blue','grey','black','white')\n",
        "# Create a Factor object.\n",
        "Color_factor<- factor(color)"
      ],
      "execution_count": 0,
      "outputs": []
    },
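    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "To see what a factor stores, it helps to use a vector with repeated values. The sketch below uses a hypothetical vector `shades` and is only an illustration of the labels and the integer codes behind them.\n",
        "\n",
        "```r\n",
        "shades <- c('red', 'green', 'blue', 'green', 'red', 'red')  # hypothetical data with repeats\n",
        "shades_factor <- factor(shades)\n",
        "levels(shades_factor)       # distinct values, sorted: blue green red\n",
        "nlevels(shades_factor)      # 3\n",
        "as.integer(shades_factor)   # internal integer codes: 3 2 1 2 3 3\n",
        "table(shades_factor)        # counts per level: blue 1, green 2, red 3\n",
        "```"
      ]
    },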
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7x8L1BucuDZq",
        "colab_type": "text"
      },
      "source": [
        "## R Software \n",
        "\n",
        "The R software is a freeware available on this  [website](http://cran.r-project.org/) .\n",
        "\n",
        "There are 3  versions: [Windows](https://cran.r-project.org/bin/windows/), [MacOS](https://cran.r-project.org/bin/macosx/) and [Linux](https://cran.r-project.org/bin/linux/).\n",
        "\n",
        "![](http://pndar.ir/wp-content/uploads/2019/02/rlogo-382x226.jpg)\n",
        "\n",
        "The available options are:\n",
        "\n",
        "* An object-oriented programming language\n",
        "* Basic functions\n",
        "* Additional libraries/packages (1800 on the [CRAN site](http://cran.r-project.org/))\n",
        "\n",
        "To use the R help, you can type the following commands on the editor:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Pf36P5fTv3cS",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "help (\"rm\") #Get help on the usefulness of the rm() function\n",
        "help . search (\"rm\")"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZrtUKMOewIJy",
        "colab_type": "text"
      },
      "source": [
        "### 1. Basic operations\n",
        "\n",
        "The basic  operations on scalars are: `*, -, /, +, ˆ.`\n",
        "As already mentioned above, the assignment of objects to values is done via  operators:  `< - or =.`\n",
        "\n",
        "**As examples:**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "oJwy8_bAws1j",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# create a variable a which contains the value 42\n",
        "a <- 42\n",
        "\n",
        "# The content of the variable is displayed\n",
        "a\n",
        "# Change the content of the variable\n",
        "a <- 8\n",
        "# Display its content\n",
        "a\n",
        "# The assignment also works in the other direction\n",
        "5 -> a\n",
        "a\n",
        "5 -> b\n",
        "a+b >a+b"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "BTf8AZKH0OuN",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "11*13 # a multiplication operation\n",
        "pi # R gives the value of pi\n",
        "x = 2*3 # stores the result in x without displaying it\n",
        "x # displays the value of x\n",
        "log(x)/x # calculation with x\n",
        "y <- 13 # y<-13 is equivalent to y=13\n",
        "x*x^y # exponent calculation\n",
        "ls() # displays the names of the created variables\n",
        "5*3+4 # is not the same calculation as 5*(3+4)\n",
        "5*(3+4)\n",
        "x*x^y # is not the same calculation as (x*x)^y\n",
        "(x*x)^y"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VncVsJBK0aNP",
        "colab_type": "text"
      },
      "source": [
        "In R, the logical comparison operators are: ==, <, >, <=, >=, !=.  The output result is either true (T=True) or false (F=False) :\n",
        "\n",
        "**As example:**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "xCreoE1s0_Y_",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "3 < 5 # the answer is True\n",
        "x > y # it's False\n",
        "(x+7) == y # it's True"
      ],
      "execution_count": 0,
      "outputs": []
    },
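    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "Comparison operators also work element by element on vectors, which makes them handy for filtering. A minimal sketch, assuming a hypothetical vector `w` holding the weights from the vectors section:\n",
        "\n",
        "```r\n",
        "w <- c(13.5, 20.7, 30.5, 38.1, 41.5)  # hypothetical copy of the weights\n",
        "heavy <- w > 30           # element-wise comparison: FALSE FALSE TRUE TRUE TRUE\n",
        "w[heavy]                  # keep only the values above 30: 30.5 38.1 41.5\n",
        "sum(heavy)                # TRUE counts as 1, so this counts the matches: 3\n",
        "w > 20 & w < 40           # combine conditions with & (and), | (or), ! (not)\n",
        "```"
      ]
    },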
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "zJg7OHNUxSPs",
        "colab_type": "text"
      },
      "source": [
        "In the previous examples, we have created numeric variables because we have assigned numbers to them. As shown below, you can also assign a string to it.\n",
        "\n",
        "**As examples:**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "XO4NOrhzxceL",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# assign a character string to the variable a\n",
        "a <-'Virgilio'\n",
        "#Display its content\n",
        "a"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "B10qWa32xsIH",
        "colab_type": "text"
      },
      "source": [
        "We will now see some functions applicable to strings. In R, the concatenation is done thanks to the ` paste() function.`\n",
        "\n",
        "**As example:**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "kHfO8r73x2sk",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# create variables containing the information\n",
        "age <- 3\n",
        "name <- 'Virgilio'\n",
        "# The paste() function is called by giving it the different items of the final sentence in the following order\n",
        "paste('Hello my name is', name,'and I am', age,'months', sep=' ')"
      ],
      "execution_count": 0,
      "outputs": []
    },
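    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "A few variations on `paste()` are worth knowing. The sketch below uses made-up strings for illustration; `paste0()` and the `collapse` argument are part of base R.\n",
        "\n",
        "```r\n",
        "name <- 'Virgilio'\n",
        "paste('Hello', name)                      # default separator is a space: Hello Virgilio\n",
        "paste0('file_', 1:3, '.txt')              # no separator, works on vectors: file_1.txt file_2.txt file_3.txt\n",
        "paste(c('a', 'b', 'c'), collapse = '-')   # join a vector into one string: a-b-c\n",
        "```"
      ]
    },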
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "kN3e_P6_yL1g",
        "colab_type": "text"
      },
      "source": [
        "The `nchar() function `allows to count the number of letters in a character string as shown in the following example:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "_fHJ-7PlyRtQ",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# This function allows to count the number of characters and spaces\n",
        "nchar(\"Virgilio\")"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Sde37ySEyYut",
        "colab_type": "text"
      },
      "source": [
        "The two functions `toupper() and tolower()` respectively are used to transform the given character string into an argument either all in upper case (`toupper() function`) or all in lower case (`tolower() function`).\n",
        "\n",
        "**As example:**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "UyICoqPJyre9",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "toupper (\"ViRGilio\")\n",
        "tolower(\"virGilio\")"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "eZkJSuV1yyJT",
        "colab_type": "text"
      },
      "source": [
        "Now we will  discover a number of functions that will allow to perform simple mathematical operations.\n",
        "\n",
        "**As examples:**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "MeYq_Jz2y7Yw",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# The floor function refers to the lower integer\n",
        "floor(2.4)\n",
        "# The ceiling function returns the upper integer\n",
        "ceiling(2.4)\n",
        "# The round function rounds to the nearest integer\n",
        "round(2.4)\n",
        "round(2.6)\n",
        "# The cos function\n",
        "cos(90)\n",
        "# The Sin function\n",
        "sin(90)\n",
        "# The tangent function\n",
        "tan(90)\n"
      ],
      "execution_count": 0,
      "outputs": []
    },
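    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text"
      },
      "source": [
        "R's trigonometric functions expect angles in radians. If the angle is known in degrees, it can be converted first, as in this small sketch (the variable names `deg` and `rad` are chosen here for illustration):\n",
        "\n",
        "```r\n",
        "deg <- 90\n",
        "rad <- deg * pi / 180   # convert degrees to radians\n",
        "sin(rad)                # 1\n",
        "cos(rad)                # about 6.1e-17, that is, zero up to floating-point error\n",
        "```"
      ]
    },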
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0EgxOCIJzVdB",
        "colab_type": "text"
      },
      "source": [
        "We will now see how to manipulate  vectors.\n",
        "Note that the `vector() function` allows  to create a vector.\n",
        "\n",
        "**As examples:**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ilhPabk2zg5U",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Create a vector containing 10 numerical elements\n",
        "vector(\"numeric\", 10)\n",
        "# For example, you can create a vector containing character strings of length 5\n",
        "vector(\"character\", 5)\n",
        "# Create of a vector containing 8 logical elements (Boolean, default value FALSE)\n",
        "vector(\"logical\", 8)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rDPqu9hGzsN9",
        "colab_type": "text"
      },
      "source": [
        "The `scan() function` is used to type items on the keyboard.\n",
        "\n",
        "**As example:**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "skvp5LYKzzp9",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Create a vector of size 3 using the scan() function\n",
        "scan(nmax=3)"
      ],
      "execution_count": 0,
      "outputs": []
    }
  ]
}